AITopics | bayesian uncertainty

A Simple Baseline for Bayesian Uncertainty in Deep Learning

Neural Information Processing Systems, Dec-25-2025, 01:02:19 GMT

We propose SWA-Gaussian (SWAG), a simple, scalable, and general purpose approach for uncertainty representation and calibration in deep learning. Stochastic Weight Averaging (SWA), which computes the first moment of stochastic gradient descent (SGD) iterates with a modified learning rate schedule, has recently been shown to improve generalization in deep learning. With SWAG, we fit a Gaussian using the SWA solution as the first moment and a low rank plus diagonal covariance also derived from the SGD iterates, forming an approximate posterior distribution over neural network weights; we then sample from this Gaussian distribution to perform Bayesian model averaging. We empirically find that SWAG approximates the shape of the true posterior, in accordance with results describing the stationary distribution of SGD iterates. Moreover, we demonstrate that SWAG performs well on a wide variety of tasks, including out of sample detection, calibration, and transfer learning, in comparison to many popular alternatives including variational inference, MC dropout, KFAC Laplace, and temperature scaling.
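The procedure the abstract describes (track the first moment of the SGD iterates, fit a diagonal plus low-rank covariance from the same iterates, then sample weights and average predictions) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the class name `SwagPosterior`, the parameters `max_rank` and `scale`, the helper `bayesian_model_average`, and the treatment of the network as a single flat weight vector are all assumptions made for the example.

```python
import numpy as np


class SwagPosterior:
    """Running SWAG-style moments for a flat weight vector (illustrative sketch)."""

    def __init__(self, dim, max_rank=20):
        self.mean = np.zeros(dim)     # running first moment (the SWA solution)
        self.sq_mean = np.zeros(dim)  # running second moment, used for the diagonal
        self.deviations = []          # most recent deviations from the running mean
        self.max_rank = max_rank      # hypothetical cap on the low-rank component
        self.n = 0

    def collect(self, weights):
        """Fold one SGD iterate into the running statistics."""
        w = np.asarray(weights, dtype=float)
        self.n += 1
        self.mean += (w - self.mean) / self.n
        self.sq_mean += (w * w - self.sq_mean) / self.n
        self.deviations.append(w - self.mean)
        if len(self.deviations) > self.max_rank:
            self.deviations.pop(0)

    def sample(self, rng, scale=1.0):
        """Draw weights from N(mean, scale/2 * (diag + low_rank))."""
        # Clip tiny negative values caused by floating-point cancellation.
        var = np.clip(self.sq_mean - self.mean ** 2, 0.0, None)
        diag_part = np.sqrt(var) * rng.standard_normal(self.mean.shape[0])
        k = len(self.deviations)
        if k > 1:
            dev = np.stack(self.deviations, axis=1)  # shape (dim, k)
            low_rank_part = dev @ rng.standard_normal(k) / np.sqrt(k - 1)
        else:
            low_rank_part = np.zeros_like(self.mean)
        return self.mean + np.sqrt(scale / 2.0) * (diag_part + low_rank_part)


def bayesian_model_average(posterior, predict_fn, inputs, n_samples=30, seed=0):
    """Average predictions over weight samples drawn from the fitted Gaussian."""
    rng = np.random.default_rng(seed)
    preds = [predict_fn(posterior.sample(rng), inputs) for _ in range(n_samples)]
    return np.mean(preds, axis=0)
```

In use, `collect` would be called on the flattened weights at the chosen points of the SGD run, and `predict_fn(weights, inputs)` stands in for a forward pass of whatever model is being trained. Capping the number of stored deviations keeps memory linear in the number of weights, which is what makes a low-rank term practical for large networks.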

bayesian uncertainty, name change, simple baseline

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.55)

Reviews: A Simple Baseline for Bayesian Uncertainty in Deep Learning

Neural Information Processing Systems, Jan-21-2025, 18:31:24 GMT

The method is almost trivially simple, scalable and easy to implement, yet the empirical evaluation shows that it performs competitively and often better than all alternatives. This is the best kind of paper! The task of representing uncertainty over model weights is highly significant -- it is debatably *the* core problem in Bayesian deep learning, with (as the authors point out) applications to calibrated decision making, out-of-sample detection, adversarial robustness, transfer learning, and more. I expect this baseline to be widely used by researchers in the field, and likely implemented by practitioners as well. The paper is well written and easy to follow.

bayesian uncertainty, deep learning, simple baseline

Neural Information Processing Systems

Genre: Summary/Review (0.57)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.76)

Reviews: A Simple Baseline for Bayesian Uncertainty in Deep Learning

Neural Information Processing Systems, Jan-21-2025, 18:31:13 GMT

This paper presents SWAG, a method that uses the iterates of a Polyak-averaging-like stochastic gradient descent to approximate the posterior distribution of a neural network. It is presented as a simple baseline for uncertainty in large deep neural networks, and the authors demonstrate its effectiveness on a variety of large-scale tasks, including residual networks on CIFAR and ImageNet.

The strengths of this paper are:
- it is indeed a simple baseline for a promising area of research that is really lacking good baselines
- experiments are thorough and on benchmarks that are large and interesting to the wider deep learning community
- the authors empirically evaluate the quality of their approximation and provide some analysis

The main criticism of this paper is that it is not really Bayesian from a purist perspective. R3 is correct to point out that the presented approximation cannot actually capture the true posterior, as shown by Mandt et al. (Stochastic Gradient Descent as Approximate Bayesian Inference). The language of the paper at times implies otherwise and R3 is right to point this out (e.g.

bayesian uncertainty, neural network, simple baseline

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.98)

Bayesian Uncertainty for Gradient Aggregation in Multi-Task Learning

Achituve, Idan, Diamant, Idit, Netzer, Arnon, Chechik, Gal, Fetaya, Ethan

arXiv.org Artificial Intelligence, Feb-6-2024

As machine learning becomes more prominent, there is a growing demand to perform several inference tasks in parallel. Running a dedicated model for each task is computationally expensive, and therefore there is great interest in multi-task learning (MTL). MTL aims at learning a single model that solves several tasks efficiently. Optimizing MTL models is often achieved by computing a single gradient per task and aggregating them to obtain a combined update direction. However, these approaches do not consider an important aspect: the sensitivity in the gradient dimensions. Here, we introduce a novel gradient aggregation approach using Bayesian inference. We place a probability distribution over the task-specific parameters, which in turn induces a distribution over the gradients of the tasks. This additional valuable information allows us to quantify the uncertainty in each of the gradient dimensions, which can then be factored in when aggregating them. We empirically demonstrate the benefits of our approach on a variety of datasets, achieving state-of-the-art performance.
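To make the idea of factoring per-dimension uncertainty into aggregation concrete, the sketch below uses generic inverse-variance (precision) weighting: each task's gradient counts for less in the dimensions where its variance is high. This is an assumed stand-in chosen for illustration and is not claimed to be the aggregation rule of the paper; the function name `precision_weighted_aggregate`, its arguments, and the `eps` stabilizer are hypothetical.

```python
import numpy as np


def precision_weighted_aggregate(grad_means, grad_vars, eps=1e-8):
    """Combine per-task gradients, down-weighting uncertain dimensions.

    grad_means, grad_vars: array-likes of shape (n_tasks, dim) holding the
    mean and variance of each task's gradient in every dimension.
    NOTE: inverse-variance weighting is an illustrative assumption, not the
    rule proposed in the paper.
    """
    means = np.asarray(grad_means, dtype=float)
    precisions = 1.0 / (np.asarray(grad_vars, dtype=float) + eps)
    # Normalize so the weights across tasks sum to one in every dimension.
    weights = precisions / precisions.sum(axis=0, keepdims=True)
    return (weights * means).sum(axis=0)


# Two tasks, two gradient dimensions; task 2 is very uncertain in dimension 0.
task_grads = [[1.0, 0.0], [3.0, 4.0]]
task_vars = [[1.0, 1.0], [1e6, 1.0]]
update = precision_weighted_aggregate(task_grads, task_vars)
```

When all variances are equal, the rule reduces to a plain average of the task gradients, so the uncertainty estimates only change the update where the tasks differ in confidence.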

bayesagg-mtl, dataset, gradient

arXiv.org Artificial Intelligence

2402.04005

Country:

Asia > Middle East > Israel (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
North America > Canada > Ontario > Toronto (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)

A Simple Baseline for Bayesian Uncertainty in Deep Learning

Maddox, Wesley J., Izmailov, Pavel, Garipov, Timur, Vetrov, Dmitry P., Wilson, Andrew Gordon

Neural Information Processing Systems, Mar-19-2020, 02:02:23 GMT

We propose SWA-Gaussian (SWAG), a simple, scalable, and general purpose approach for uncertainty representation and calibration in deep learning. Stochastic Weight Averaging (SWA), which computes the first moment of stochastic gradient descent (SGD) iterates with a modified learning rate schedule, has recently been shown to improve generalization in deep learning. With SWAG, we fit a Gaussian using the SWA solution as the first moment and a low rank plus diagonal covariance also derived from the SGD iterates, forming an approximate posterior distribution over neural network weights; we then sample from this Gaussian distribution to perform Bayesian model averaging. We empirically find that SWAG approximates the shape of the true posterior, in accordance with results describing the stationary distribution of SGD iterates. Moreover, we demonstrate that SWAG performs well on a wide variety of tasks, including out of sample detection, calibration, and transfer learning, in comparison to many popular alternatives including variational inference, MC dropout, KFAC Laplace, and temperature scaling.

bayesian uncertainty, deep learning, simple baseline

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.65)

A Simple Baseline for Bayesian Uncertainty in Deep Learning

Maddox, Wesley, Garipov, Timur, Izmailov, Pavel, Vetrov, Dmitry, Wilson, Andrew Gordon

arXiv.org Machine Learning, Feb-7-2019

We propose SWA-Gaussian (SWAG), a simple, scalable, and general purpose approach for uncertainty representation and calibration in deep learning. Stochastic Weight Averaging (SWA), which computes the first moment of stochastic gradient descent (SGD) iterates with a modified learning rate schedule, has recently been shown to improve generalization in deep learning. With SWAG, we fit a Gaussian using the SWA solution as the first moment and a low rank plus diagonal covariance also derived from the SGD iterates, forming an approximate posterior distribution over neural network weights; we then sample from this Gaussian distribution to perform Bayesian model averaging. We empirically find that SWAG approximates the shape of the true posterior, in accordance with results describing the stationary distribution of SGD iterates. Moreover, we demonstrate that SWAG performs well on a wide variety of computer vision tasks, including out of sample detection, calibration, and transfer learning, in comparison to many popular alternatives including MC dropout, KFAC Laplace, and temperature scaling.

bayesian uncertainty, simple baseline, swag

arXiv.org Machine Learning

1902.02476

Country: